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Abstract. This paper introduces an algorithm for the nonnegative matrix 
factorization-and-completion problem, which aims to find nonnegative low-rank 
matrices X and Y so that the product XY approximates a nonnegative data 

Q matrix M whose elements are partially known (to a certain accuracy). This 

problem aggregates two existing problems: (i) nonnegative matrix factorization 
where all entries of M are given, and (ii) low-rank matrix completion where non- 
negativity is not required. By taking the advantages of both nonnegativity and 
low-rankness, one can generally obtain superior results than those of just using 
one of the two properties. We propose to solve the non-convex constrained 
least-squares problem using an algorithm based on the classic alternating di- 
rection augmented Lagrangian method. Preliminary convergence properties of 
the algorithm and numerical simulation results are presented. Compared to a 
recent algorithm for nonnegative matrix factorization, the proposed algorithm 
produces factorizations of similar quality using only about half of the matrix 
entries. On tasks of recovering incomplete grayscale and hyperspectral images, 

t-h the proposed algorithm yields overall better qualities than those produced by 

two recent matrix-completion algorithms that do not exploit nonnegativity. 

CO Keywords, nonnegative matrix factorization, matrix completion, alternat- 

ing direction methd, hyperspectral unmixing 
MSC. 15A83, 65F30, 90C26, 90C90, 94A08 
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1 Introduction 

Ctf This paper introduces an algorithm for the following problem: 

Definition 1 (Nonnegative matrix factorization / completion (NMFC)). Given 
samples M{j, G f) C {l,...,m} x {]_,..., n} ; of a nonnegative rank-r 

matrix M <E R mXn ; find nonnegative matrices X <G R mX ^ and Y <E R qXn such 
that || M — XY\\p is minimized. 

Note that q is not necessarily set to equal r. Firstly, not all rank-r non- 
negative matrices have nonnegative factors of size r. For some of them, the 
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available size of nonnegative factors is strictly greater than r. Secondly, when 
M is approximately low-rank, i.e. the singular values of M have a fast-decaying 
distribution, one often sets q to be the estimated rank or the number of signifi- 
cant singular values. This resulting problem can be called approximate NMFC. 
In general, depending on data and applications, q can be either equal, less than, 
or greater than r. 

NMFC is a combination of nonnegative matrix factorization (NMF) — 
which finds nonnegative factors of a nonnegative matrix given all of its en- 
tries — and low-rank matrix completion (LRMC) — which recovers M from an 
incomplete set of its entries without assuming nonnegativity. Mathematically, 
given a matrix M E R mXn and q > 0, we present the three problems with the 
following models 

NMFC: t? \\\^XY-M)\\l: ^ >V« > 0," i, i } • (1) 
NMF: {txr - Mf F : '^UvW } ' <*> 
LRMC: mm jrank(Z) : ^ _ ^ = Q | , (3) 

where indexes the known entries of M and ^q(A) returns a copy of A that 
zeros out the entries not in Note that each of the three problems has other 
models. Examples include weighted least-squares for NMF and NMFC and 
nuclear- norm minimization for LRMC. While ([!]) and ^ return XY up to a 
fixed rank q, ^ seeks for a least-rank recovery Z. It is well known that models 
(1)-([3|) are non-convex and generally difficult to solve. A recent advance for 
(3) is that if M is low-rank and the samples ft satisfy the so-called incoherence 
property and are sufficiently large, then a convex problem based on nuclear 
norm minimization can exactly recover M (see the pioneering work [10] , as well 
as recent results [26| l6| l5| 17]). 

We are interested in NMFC since it complements NMF and LRMC. NMF 
has been widely used in data mining such as text mining, dimension reduction 
and clustering, as well as spectral data analysis. It started to appear in [2U 
\22\ [23] and has become popular since the publication of [18J in 1999. More 
information on NMF can be found in the survey paper PQ, as well as books 
[Hl|9]. Unlike NMF, NMFC assumes that the underlying matrix is incompletely 
sampled; hence, it leads to saving of sampling time and storage (for data such as 
images) and has broader applicability. On the other hand, LRMC has recently 
found a large number of applications including collaborative filtering, which is 
used by Netflix to infer individual preference from an incomplete set of user 
preferences [13], global positioning, which discovers the positions of nodes in a 
network from incomplete pair- wise distances [3], system identification and order 
reduction, which recovers or reduces the dimension of the state vectors of a linear 
time-invariant state-space model [20], as well as the background subtraction and 
structure-from-motion problems in computer vision. A rank-g matrix M can 
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be written as M = XY for matrices X with q columns and Y with q rows. 
When X and Y are known to be nonnegative a priori, empirical evidence given 
in Section [3] shows that imposing nonnegativity on the factors improves the 
recovery quality. In particular, in certain applications such as hyperspectral 
unmixing, the factors are nonnegative due to their physical nature, so these 
applications will benefit from NMFC. To summarize, NMFC combines NMF 
and LRMC, and NMFC is useful when the underlying matrix has both low 
rank and nonnegative factors. 

1.1 Related Algorithms 

There are two algorithms that have been widely used for NMF: the alternating 
least squares (ALS) in [24J and multiplicative updating (Mult) in [19] . The 
former algorithm alternatively updates factor matrices X and Y to reduce the 
least-squares cost \\XY — M\\ 2 F . The closed- form updates are given as 

X new <- max{o,My T (yy T )t}, 

y new <- max{0, (X T X) t X T M}, 

where max{-, •} is applied component- wise and f denotes pseudo-inverse. The 
algorithm Mult has much cheaper multiplicative updates 

(Xnew)y <- X ij (MY T ) ij /(XYY T + e)ij, Vi,j, 
(Yue^ij <- Y ij (X T M) ij /(X T XY + e) ij , V 

which do not involve matrix inversion. Starting from a nonnegative initial 
matrix Y , X and Y remain nonnegative during the iterations of Mult. The 
algorithm presented in this paper also applies to NMF if a complete sample 
set ft is used. The resulting algorithm, which has been studied in paper [33] , 
is simpler and compares favorably with ALS and Mult in terms of both speed 
and solution quality. In fact, the proposed algorithm in this paper extends the 
work in [34J, and both algorithms are based on the algorithm of alternating 
direction method of multipliers (ADM) [121 HD El EEl E21 [30J . Likewise, we 
can extend the algorithms ALS and Mult to solving NMFC. Extending ALS 
is as straightforward as adopting the least-square cost \\&q(XY — M)|||, and 
deriving the corresponding updates. One simple approach to extend Mult is to 
replace M by M E R mXn , defined component-wise by = M^-l^j)^, i.e. 

M is a copy of M with the unsampled entries set to 0. Drawing conclusions 
based on the comparative results in [5%] . we believe that ADM based methods 
deliver higher- quality solutions in shorter times. 

There are also several algorithms for LRMC. Since LRMC can complete 
a matrix and return factors that happen to be (approximately) nonnegative, 
we shall briefly review a few well-known LRMC algorithms and compare them 
to the proposed algorithm. Singular value thresholding (SVT) [I] and fixed- 
point shrinkage (FPCA) [21] are two well-known algorithms. SVT applies the 
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linearized Bremgan iterations [33] to the unconstrained nuclear-norm model of 
LRMC: 

min A||Z|| # + (l/2)||^ n (Z - M)f F . (4) 

FPCA solves the same model using iterations based on an iterative shrinkage- 
thresholding algorithm [16J. Furthermore, classic alternating direction aug- 
mented Lagrangian methods have been applied to solving Q or its variant 
with constraints ^n(Z - M) = in [Til I5T]. The algorithm LMaFit [29J uses 
a different model: 

mm{\\XY - Z\\ F : 3» a {Z - M) = 0}. (5) 

The model is solved by a nonlinear successive over-relaxation algorithm [T5] . 
In section 3, we compare the proposed algorithm to FPCA and LMaFit and 
demonstrate the benefits of taking advantages of factor nonnegativity. 

1.2 Organization 

The rest of this paper is organized as follows. Section 2 reviews the ADM 
algorithm and presents an ADM-based algorithm for NMFC. A preliminary 
convergence result of this algorithm is given in Section 2.3. Section 3 presents 
the results of numerical simulations, which perform tasks such as decomposing 
nonnegative matrices, compressing grayscale images, as well as recovering three- 
dimensional hyperspectral cubes from incomplete samples. Finally, Section 4 
concludes this paper. 

2 Algorithm and Convergence 
2.1 Background: the ADM approach 

In a finite-dimensional setting, the classic alternating direction method (ADM) 
solves structured convex programs in the form of 

min f(x) + g(y), s.t. Ax + By = c, (6) 

where / and g are convex functions defined on closed subsets ^ and <3f of a 
finite-dimensional space, respectively, and A, B and c are matrices and vector 
of appropriate sizes. The augmented Lagrangian of (§ is 

J^ A (x, y, A) = f(x) + g(y) + X T (Ax + By - c) + ^\\Ax + By - c\\l 

where A is a Lagrangian multiplier vector and (3 > is a penalty parameter. 

The classic alternating direction method is an extension of the augmented 
Lagrangian multiplier method [T71 |25l |27] . It performs minimization with re- 
spect to x and y alternatively, followed by the update of A; that is, at iteration 
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x k+1 <- argminJ^O^jA X k ), (7a) 
y k+1 <- argmin^ 4 (x /c+1 ,y, \ k ), (7b) 



where 7 E (0,1.618) is a step length. While (7a) only involves f(x) in the 
objective and ( p7b| ) only involves the classic augmented Lagrangian method 
requires a minimization of Jzf^x, A fc ) with respect to x and y jointly, i.e., 
replacing (7a) and (l7bj) by 



{x k+1 ,y k+1 ) <- argmin 3£ A {x,y, X k ). 

As the minimization couples f(x) and g(y), it can be much more difficult than 
(7a) and (7b). 

2.2 Main Algorithm 

To facilitate an efficient use of ADM, we consider an equivalent form of ([!]): 

min (E/,v,x,y,z) l\\ XY ~ z \\f 
s.t. X = U,Y = V, 

U>0,V>0, {) 

&n(Z-M) = 0, 

where X, U E R mXq and Y, F E M gXn . The augmented Lagrangian of @ is 
Jf A (X,Y,Z,U,V,A,IL) = ±\\XY-Z\\ 2 F + Am(X- U) 

+n • (Y - V) + - U\\ 2 F + £\\Y - V\\ 2 F , 

where A E M mxg , II E R gXn are Lagrangian multipliers, a, (3 > are penalty 
parameters, and A%B := J2i j a ij^ij f° r matrices A and £> of the same size. We 
deliberately leave &q(Z — M) = in the constraints instead of relaxing them, 
so only those entries of Z not in are free variables. 

The alternating direction method for ([§]) is derived by successively minimiz- 
ing J^a with respect to X, Y, Z, [/, V , one at a time while fixing others at their 
most recent values, i.e., 

X k+1 = argmin ££ A {X, Y k , Z k , U k , V k , A k , U k ), 
Y k+1 = argmin 3? A {X k +\, Y, Z k , U k , V k , A k , IL k ), 

Z k+ i = argmin Jf A (X k+1 , Y k + U Z, U k , V k , A k , U k ), 

&> n (Z-M)=0 

U k +i = axgmmJf A (X k +i,Y k +i,Z k +i, U, V k , A k , U k ), 

u>o 

V k +\ = arg minJf A (X k+1 , Y k+1 , Z k+1 , E/fc+i, V,A k ,U k ), 
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and then updating the multipliers A and EL. Specifically, these steps can be 
written in closed form as 



Xk+i 


= (Z k Y k T + aU k - A k )(Y k Y? + al)~\ 


(9a) 


Y k+ i 


= {xl +1 x k+1 + pi)~\xl +1 z k + pv k - U k ), 


(9b) 


Zk+l 


= X k+1 Y k+1 + &> n (M - X k+1 Y k+1 ), 


(9c) 


Uk+i 


= 0> + {X k+l + A k /a), 


(9d) 


v k+1 


= ^ + (Y k+1 +U k /p), 


(9e) 




= A k + ^a(X k+1 - U k+1 ), 


(9f) 


Ilfc+i 


= U k + 7 /3(Y k+1 -V k+1 ), 


(9g) 



where 7 E (0, 1.618) and (&+(A))ij — max{a^-,0}. Since matrix inversions are 
applied to q x q matrices, they are relatively inexpensive for q < min{m,n}. 

2.3 Convergence 

Global convergence can be obtained when the classic ADM is applied to two- 
block convex programs in the form of Q. However, to the best of our knowl- 
edge, there is no global convergence result in general for non-convex programs 
or convex programs with three or more blocks. Note that problem Q is non- 
convex and there are three blocks in updates Q. Due to these difficulties, we 
provide a convergence property of the proposed ADM algorithm that holds only 
under some assumptions. 

A point (X, Y, Z, U, V) satisfies the KKT condition for problem Q if there 
exist A and II such that 

(XY - Z)Y T + A = 0, (10a) 

x T (xy-z) + n = o, (10b) 

^nc(XY-Z) = 0, (10c) 

^n(Z-M) = 0, (lOd) 

X-U = 0, (lOe) 

Y - V = 0, (lOf) 

A<0<[/,A©[/ = 0, (lOg) 

n<o<y,n©y = o, (ioh) 

where £l c indexes the unobserved entries of M, and denotes component-wise 
multiplication. To simplify notation, we consolidate all the variables in problem 
^ as W := (X, y, Z, U, V), and write 3f A (X) to represent la grangian function 
with respect to X by fixing others at their most recent values. 

Theorem 2.1. Let {(W&, Uk)} be a sequence generated by the ADM algo- 
rithm Q. If the multiplier sequence {(A&,n&)} is bounded and satisfies 

00 

J2 (l|Afc+i - AfeHl + ||n fc+ i - u k \\%) < to. (11) 

k=0 
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Then any accumulation point of {Wk} satisfies the KKT condition for problem 
Q. Consequently, any accumulation point of {(X^,!^)} satisfies the KKT 
condition for problem ([!]). 

Proof. First, we claim Wk+i — Wk — > 0, and (Afc+i, Ilfc+i) - (A&,n&) — > 0. We 
begin the proof of this claim by observing that A, n) is bounded below. 

This follows from 

^ A (W,A,U) = ±\\XY-Z\\ F + ^\\X-U + A/a\\ F -±\\A\\ F 
+^\\Y-V + U/pf F -^\\Uf F , 

and the boundedness of {(A, II)}. Furthermore, the lagrangian function Jzf^ 
is strongly convex with respect to each variable of X, Y, Z, U and V. For X- 
variable, it holds for any X and AX that 

& A (X + AX) - & A (X) > d x ^ A (X) T AX + a\\AX\\ 2 F . (12) 

In addition, X* is a minimizer of jSf^(X) implies the inequality 

M(I*) T AI > 0. (13) 



Combining (12) and (13) and observing X&+i is a minimizer of jSf^(X) at the 
fc-th iteration, we have 

& A {X k ) - & A {X k+1 ) > a\\X k - X k+1 f F , (14) 

and in the same way, 

jsf A (n) - ^(n+i) > - n+iin, (i5a) 

jSf A (^fc) - J^fc+l) > ll^fc " 3fc+l||F, (15b) 

^ A (C/ fc ) - Jgf A (J/fc+i) > a\\U k - U k+1 \\ 2 F , (15c) 
S? A (V k ) - S? A (V k+1 ) > P\\V k - V k+1 \\ 2 F . (15d) 

Let c := min{o:,/3, 1}. Then by ( Jl4] ) and ( fl5| ), we have 

j2fA(w fc , A fc , n fc ) - ^(Wfc+i, Afc+i, n fc+1 ) 

=J2fA(W fc , A fc , n fc ) - & A (W k+1 ,A k , U k ) 
+ Sf A (W k+1 ,A k , U k ) - Sf A (W k+1 , Afc+i, n fc+ i) 

> c ||w fc - w fc+ i|||. - — 1| Afc - Afc +1 ||| - -Un* - n fc+ i|||. 

70: 7p 

>c||Wfc - Wjfe+iHj. - — (II Afc - Afc +1 ||| + \\U k - U k+1 \\ 2 F ) . 

C7 
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Taking summation of the above inequality and recalling J/f^VK, A, II) is bounded 
below, we get 

CO CO j 

J2 c \\w k - w k+1 \\ 2 F - — (IIAjfe - Afc+iiil + ||n fe - n fc+1 |^) < oo. 



/c=0 



/c=0 



Since the second term on the left of the above inequality is bounded, it follows 
that 



J2 c \\W k -W k+1 \\ 2 F <oc, 



k=0 



from which we immediately have W k+ i~W k — >• 0. For (A^+i, Ilfc+i) — (A&, life) — >> 
0, it directly follows from (11). 



Now, we are ready to prove the result of this theorem. First, rearrange the 
ADM formulas in Q into 



(X fc+ i - X k )(Y k Y k T + al) 


= -{{X k Y k -Z k )Y k T 


(16a) 




+a(X k -U k ) + A k ), 




(X{ +1 X k+1 + f3I)(Y k+1 -Y k ) 


= -(X k+1 (X k+1 Y k - Z k ) 


(16b) 




+P(Y k -V k ) + U k ), 




U k +i - u k 


= &+(X k+1 +A k /a)-U k , 


(16c) 


V k+ i - v k 


= ^+(Y k+1 +U k /p)-V k , 


(16d) 


Afc+i - A fc 


= lo:(X k+ i - U k+ i), 


(16e) 


Ilfc+i - Yl k 


= j/3(Y k+1 -V k+1 ), 


(16f) 



and 



Z k+l = X k+l Y k+1 + & n (M - X k+l Y k+l ). (17) 

Ii k imply that the left- 



Note W k +i - W k -> 0, A k+1 -A k ^0 and U k+1 
and right-hand sides in (16) all go to zero, i.e., 



(X k Y k - Z k )Y k T + A k 
Xl{X k Y k - Z k ) + n fc ) 
& + (X k + A k /a)-U k 

&>+(Y k + n k /0)-v k 
x k - u k 

Y k -V k 



— > 
— > 
— > 
— > 
— > 



o, 
o, 
o, 
o, 
o, 
o, 



(18a) 
(18b) 
(18c) 
(18d) 
(18e) 
(18f) 



where the terms a(X k — U k ) and f3(Y k — V k ) have been eliminated in ( |18a ) 
and ( |18b[ ), respectively, by invoking ( |18e[ ) and (18f). For any limit point 
W = (X,Y, Z,U,V) of sequence there exists subsequence {W n J con- 

verging to W. The boundedness of {(A k ,U k )} implies the existence of a sub- 
subsequence {(A nfc ,n nfc )} of {(A nfc ,n n J} converging to some point (A, II). 
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Hence, (W^,A,ft) is a limit point of sequence {(W&, 11^)}. Since (17) ex- 
actly means 

& Q (Z k — M) = 0, and &n(X k Y k - Z k ) = 0, 



then clearly, the first six equations in the KKT conditions ( |T0| ) are satisfied at 
the limit point {W , A, II). The nonnegativity of U and V are guaranteed by the 
algorithm construction. Therefore, we only need to verify the non-positivity 
of A and ft, and the complementarity between U and A, and between V and 



II. Now we examine the following two equations derived from (18c) and (18d), 
respectively, 

+ A/a) = U, (19a) 
+ IL/P) = V. (19b) 



Note we have X = U > 0. If = X^ = 0, then (19a) reduces &> + (h/a)ij = 0, 



which implies A^- < 0. On the other hand, if Uij = Xij > 0, then ( 19a| ) implies 



Aij = 0. This proves the non-positivity of A and the complementarity between 



U and A. The same argument can be applied to (19b), due to the identical 
structure, to prove the non-positivity of ft and the complementarity between 
V and ft. 

We have verified the statement concerning the sequence {W&} and problem 
Q. The statement concerning the sequence and problem follows 

directly from the equivalence between the two problems. This completes the 
proof. □ 



From the proof of Theorem |2.1[ we can immediately get the following corol- 
lary. 

Corollary 2.1. Let {(Wfc,Afc,IIfc)} be a sequence generated by the ADM al- 
gorithm Q. Whenever the sequence converges, the limit satisfies the KKT 
conditions. 



3 Numerical Results 

3.1 Implementation and Parameters 

A pseudo code for the proposed algorithm is given in Algorithm [T] below. 

The most important parameters are a, (3 and 7. In our implementation, we 
set 7 = 1.618, and f3 = na/m. The setting j3 — na/m considers the different 
sizes of X and Y and balances the penalties for constraints X — U and Y — V . 
The naive setting a = (3 also works for our tests but reduces the speed of 
convergence. By running a range of numerical experiments, we heuristically 
scale A so that ||^4||f = 2.5 x 10 5 and select a = 2.0 x 10 _4 ||A||i? max(m, n)/q. 
They have worked well for our tested matrices, and it is worth mentioning 
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Algorithm 1 ADM-based algorithm for NMFC 

Input A = &n(M) G K mxn , integer q > 0, maxiter > 0, and tol > 0. 

Set a, (3, 7 > 0. Set Y as a nonnegative random matrix, Z = A, and U, V, A, II 

as zero matrices of appropriate sizes. 

for k = 1, . . . , maxiter do 

Update (X k , Y k , Z k , U k , V k , A k , U k ) by the formulas @; 

if a stopping criterion is met then 
exit and output (X k ,Y k ) 

end if 
end for 



that algorithm [T] can work well for different a, (3 in a fairly large interval. The 
iteration stops once either one of the following conditions is met: 

lfk+ j~/A < tol, (20a) 
max(l, \f k \) 

fk < tol, (20b) 

where f k = \\^n(X k Y k — A)\\f/\\A\\f- All tests were performed on a Lenovo 
T410 laptop with an i7-620m CPU and 3 gigabytes of memory and running 
32-bit Windows 7 and MATLAB 2010b. 



3.2 Random Nonnegative Matrices Factorization 

We compared the algorithm proposed in [31] with the proposed algorithm [TJ 
where the former algorithm takes complete samples of a random matrix M 
while the latter algorithm takes 75%, 50%, and 25% samples of the same matrix 
M. While other reported tests in this paper used parameters and stopping rules 
given above, this test set used different but consistent parameters (which are not 
optimal for algorithm 1) for both algorithms in order to accurately reveal their 
performance difference and the difference between NMF and NMFC: a = (3 = 
10 4 and tol = 10 -6 . We generated each rank-r nonnegative matrix M E R mXn 
in the form of M = LDR, where L <E R mXr and R E W Xn were generated 
by calling MATLAB 's command rand and D is an r x r diagonal matrix with 
diagonal elements 1,2, ...,r. Such scaling makes M slightly ill-conditioned. 
We tested different combinations of n and m and obtained roughly consistent 
results. Figure [l] depicts the recovery qualities and speeds corresponding to 
m — n — 500 and varying q — r — 20 through 50. The results are the averages 
of 50 independent trials. 

The quality of recovery is similar for SR = 100%, 75%, and 50% for the set 
of tested matrices. They are all faithful recoveries with relative errors around 
0.4%. The relative errors for SR = 75%, and 50% are just slightly worse. The 
low SR = 25% makes the recovery more difficult. When the ranks r are between 
20 and 30, the four error curves are roughly parallel though the red curve (SR 
= 25%) is worse at relative errors around 0.6%. When r > 30, 25% of entries 
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Figure 1: Matrix completion with different sample rates (SRs). Left: relative error in Frobe- 
nious norm; Right: cpu time in seconds. The algorithm in 34 was used for 
SR=100%. Algorithm [l] was used for SR=70%, 50%, 25%. All "tests used the 
same parameters and stopping tolerances, and results are the averages over 50 in- 
dependent trials 



seem no longer enough for faithful recovery and consequently, the red curve 
(SR = 25%) begins to deviate from the others as r increases, and it exhibits a 
steep upward trend. The difficulty with SR = 25% samples for large r is also 
shown in terms of cpu seconds. The times for SR = 75% and 50% are about 
three times as long as those for SR = 100%. Since the times are the averages 
of merely 50 trials, the curves are not as smooth as they would be if the trials 
were much more. 

The large gap between the red curve in Figure [TJ^left) and the other curves is 
largely due to the use of the same stopping tolerance 10 -6 . However, SR=25% 
can reach the similar accuracy of higher SRs if it has a tighter tolerance (e.g., 
10 -7 ) and runs more iterations, at least for r < 30. In this sense, lower SRs do 
not necessarily mean much larger errors. 



3.3 Overview of Algorithm LMaFit and FPCA 

Before more simulation results are presented, let us overview LMaFit and 
FPCA, which were compared to Algorithm [T] in the next two simulations. 
LMaFit solves ^ based on a nonlinear successive over-relaxation (SOR) method. 
From its first-order optimality conditions 



(XY - Z)Y T = 0, 
X T (XY -Z) = 0, 
&>nc(Z-XY) = 0, 
&>n(Z-M) = 0. 
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the nonlinear SOR scheme is derived as 

X k+1 = Z k Y?(Y k Y£)i, 
X k+1 (u) = uX k+1 + (1 - u)X k , 

n+i = (X, +1 (o;) T X, +1 (a;))t(X, +1 (^) T ^), 

ife+iM = ^+1 + (i - 



where the weight uj > 1. One of its stopping criterions is the same as (20a). In 
our tests described below, we set tol = 10 -5 for Alg[l]and LMaFit and chose 
different maximum numbers of iterations based on the size of recovered matrix, 
which will be specified below. We applied the rank-estimation technique coming 
with LMaFit (hence, we did not fix q for LMaFit). 
FPCA solves convex problems in the form of 

mm f i\\X\U + ^(X)-b\\l 

which includes Q as a special case by setting the linear operator si to SPq. 
Introducing h(X) = si*(si(X) — 6), where si* is the adjoint of si , we can 
write the iteration of FPCA as 

Yk <- X k -rh(X k ), 
X k +i ^— S Tfl (Y k ), 

where S u (-) is a matrix singular-value shrinkage operator. In our tests described 
below, the parameters for FPCA were set to their default values: specifically, 
tol = 10 -6 and maxiter = 10 5 . For the default values of other parameters such 
as r and /i, we refer the reader to [21J. 



3.4 Hyperspectral Data Recovery 

In this subsection, we compare Algorithm [l] with LMaFit [29] and FPCA [21] on 
recovering three-dimensional hyperspectral images from their incomplete obser- 
vations. Hyperspectral (or multispectral) imaging is widely used in applications 
from environmental studies and biomedical imaging to military surveillance. A 
hyperspectral image is a three-dimensional datacube that records the electro- 
magnetic reflectance of a scene at varying wavelengths, from which different 
materials in the scene can be identified by exploiting their electromagnetic 
scattering patterns. We let each hyperspectral datacube be represented by a 
three-dimensional array whose first two dimensions are spatial and third di- 
mension is wavelength. A hyperspectral datacube can have several hundreds of 
wavelengths (along the third dimension) but no more than a dozen dominant 
materials. As a consequence, the spectral vector at every spatial location can 
be (approximately) linearly expressed by a small set of common vectors, called 
endmembers or spectral signatures of materials. The number of these basic vec- 
tors is much smaller than the number of wavelengths. Since endmembers are 
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naturally nonnegative, a hyperspectral datacube is a set of nonnegative mix- 
tures of a few endmembers, which are also nonnegative. This property makes it 
possible to recover the endmembers and mixture coefficients from a hyperspec- 
tral datacube, and it is called unmixing. Although unmixing is not as simple as 
NMF, the results of NMF can be used as an initial guess. Compared to NMF, 
NMFC not only performs initial unmixing but also recovers the datacube from 
an incomplete set of observed voxels. This advantage will translate to shorter 
sampling times and perhaps simpler designs of hyperspectral imaging devices. 

In our simulation, the hyperspectral datacube has 163 wavelengths or slices, 
and the size of each slice is 80 x 80. Three selected slices are shown in figure 
[2| They depict an urban area at three different wavelengths. Roads, roofs, 
plants, as well as other objects exhibit different intensities. Our simulation 
begin with reshaping the 80 x 80 x 163 hyperspectral datacube to a 6400 x 163 
matrix M, each slice becoming one column of M. While M is full rank, its 
singular values are fast decaying. We chose the estimate rank q — 30, and set 
tol=10 -5 and maxiter = 2000 for Algorithm [TJ and tol=10 -5 and maxiter = 
2000, est_rank=2, rk_inc =3 for LMaFit. The parameters for FPCA were set 
to their default values. 




Figure 2: Original slices of hyperspectral cube 



The three algorithms were compared on recovering M from incomplete ob- 
servations of SR = 30%, 40%, 50%, and their results were compared in terms of 
peak signal-to- noise ratio (PSNR), as well as mean squared error (MSE). Specif- 
ically, given a recovered matrix M from incomplete samples of M E 
let 

MSE := —\\M-M\\%, 
mn 



( MAXi \ 

VVmse/ 



PSNR:=201og 10 

where MAXj is the maximum pixel intensity, which is 1023 in this subsection for 



the tested hyperspectral data and 1 in subsection |3.5| for two grayscale images. 
The results are listed in table [TJ and the three slices of the recovered datacube 
that correspond to those in figure [2] are depicted in figure [3j The results show 
that Algorithm [T] performs better than FPCA in both CPU time and recovery 
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Table 1: real data: recovered slices by Algorithm [T] LMaFit, and FPCA. The rank estimate 
for Algorithm [l] and LMaFit is 30. 



problem 


Alg 1 


LMaFit 


FPCA 


seed 


CPU PSNR MSE 


CPU PSNR MSE 


CPU PSNR MSE 


SR: 30% 


3445 


27.15 47.71 1.77e+001 


14.80 45.05 3.27e+001 


39.39 43.31 4.89e+001 


31710 


26.38 47.52 1.85e+001 


31.35 43.08 5.15e+001 


39.19 43.56 4.61e+001 


43875 


27.45 47.50 1.86e+001 


40.79 42.26 6.23e+001 


38.49 44.35 3.84e+001 


69483 


25.66 47.71 1.77e+001 


42.07 42.20 6.31e+001 


39.04 44.20 3.98e+001 


95023 


25.67 47.48 1.87e+001 


32.21 43.02 5.22e+001 


39.14 43.13 5.10e+001 


SR: 40% 


3445 


28.46 48.89 1.35e+001 


27.51 44.96 3.34e+001 


42.85 44.92 3.37e+001 


31710 


28.66 49.00 1.32e+001 


25.99 45.70 2.82e+001 


42.96 44.92 3.37e+001 


43875 


29.81 48.88 1.36e+001 


21.89 45.09 3.24e+001 


43.32 44.48 3.73e+001 


69483 


28.24 48.86 1.36e+001 


21.38 45.26 3.12e+001 


43.99 44.26 3.92e+001 


95023 


29.10 48.73 1.40e+001 


19.31 45.95 2.66e+001 


43.87 44.79 3.47e+001 


SR: 50% 


3445 


30.71 49.73 l.lle+001 


34.78 44.69 3.55e+001 


47.62 44.43 3.77e+001 


31710 


30.39 49.92 1.07e+001 


22.75 46.21 2.50e+001 


46.64 43.69 4.47e+001 


43875 


31.34 49.74 l.lle+001 


22.69 46.47 2.36e+001 


46.77 44.20 3.98e+001 


69483 


30.36 49.98 1.05e+001 


29.84 45.00 3.31e+001 


47.15 44.15 4.02e+001 


95023 


30.27 49.83 1.09e+001 


47.71 44.09 4.08e+001 


47.63 44.36 3.84e+001 



quality. LMaFit is comparable with algorithm [T] in terms of speed but less 
accurate. We believe that the use of nonnegativity is a major factor for the 
superiority of the results of algorithm [I] 

3.5 Tests on images 

Despite that natural image recovery from incomplete random samples is not a 
typical image processing task, we picked it to test algorithm [TJ LMaFit, and 
FPCA since it is easy to visualize their solution qualities. This simulation used 
two grayscale images, the 768 x 1024 Kittens and the 1200 x 1600 Panda, shown 
in figure [I| 

We applied relatively small (thus, challenging) sample rates of SR=10%, 
20%, 30% for Kittens and SR=10%, 15%, 20% for Panda. We set tol=10- 5 , 
and maxiter=2000 for Algorithm [T] and LMaFit and est_rank=2, rk_inc =3 for 
LMaFit. The parameters for FPCA were set to their default values. The results 
are given in tables [2] and [3] and the recovered images in figures [5] and [6j 

Tables [2] and [3] indicate that FPCA performs slightly better than algorithm 
[I] in terms of recovery quality but slower when SR is as small as 10% while 
at this SR, LMaFit performs much worse. With larger SRs such as 20% and 
30% for Kittens and SR=15% and 20% for Panda, algorithm [I] is both faster 
and returns better images than FPCA. With SR=20% and 30%, algorithm [l] 
is better than LMaFit on Kittens in terms of recovery quality with comparable 
speed. However, with SR=20%, LMaFit becomes slightly faster than algorithm 
[T] on Panda with comparable recovery quality. As SR further increases, the 
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Figure 3: Recovered slices by Algor it hm[l] (first rows), LMaFit (second rows), and FPCA (last 
rows), respectively; rank estimate for Algorithm IT] and LMaFit is 30 




(a) SR = 30% (b) SR = 40% (c) SR = 50% 





Figure 4: Original images: Kittens (left) and Panda (right) 
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Table 2: Recover Kittens by Algorithm [T] LMaFit, and FPCA. The rank estimate for Algo- 
rithm [T] and LMaFit is 40. 



problem 


Ai g m 


LMaFit 


FPCA 


seed 


CPU PSNR MSE 


CPU PSNR MSE 


CPU PSNR MSE 


SR: 10% 


3445 


20.98 18.21 1.51e-002 


42.96 13.34 4.64e-002 


23.01 20.09 9.80e-003 


31710 


19.00 18.15 1.53e-002 


17.34 14.58 3.48e-002 


23.30 20.12 9.72e-003 


43875 


20.03 18.07 1.56e-002 


39.95 13.37 4.60e-002 


23.40 20.18 9.59e-003 


69483 


20.33 18.09 1.55e-002 


13.60 15.14 3.06e-002 


23.04 20.07 9.84e-003 


95023 


20.06 18.04 1.57e-002 


17.89 14.37 3.66e-002 


23.25 20.06 9.87e-003 


SR: 20% 


3445 


12.66 23.26 4.72e-003 


11.88 21.50 7.08e-003 


36.73 22.38 5.78e-003 


31710 


9.68 23.15 4.84e-003 


9.93 21.94 6.39e-003 


33.92 22.25 5.95e-003 


43875 


9.78 23.19 4.80e-003 


13.01 21.37 7.30e-003 


34.10 22.26 5.94e-003 


69483 


10.15 23.12 4.87e-003 


27.82 19.90 1.02e-002 


34.20 22.38 5.78e-003 


95023 


9.89 23.17 4.82e-003 


10.20 21.49 7.10e-003 


34.01 22.31 5.87e-003 


SR: 30% 


3445 


9.63 24.53 3.53e-003 


9.77 24.11 3.88e-003 


54.65 23.44 4.53e-003 


31710 


7.98 24.48 3.56e-003 


8.35 24.10 3.89e-003 


54.64 23.30 4.68e-003 


43875 


9.81 24.48 3.57e-003 


12.54 24.03 3.95e-003 


55.20 23.47 4.50e-003 


69483 


7.91 24.45 3.59e-003 


5.54 23.78 4.19e-003 


54.83 23.40 4.57e-003 


95023 


8.64 24.46 3.58e-003 


9.33 24.06 3.92e-003 


54.45 23.33 4.65e-003 



three algorithms will return images with almost the same quality while LMaFit 
is the best in speed. 



4 Conclusions 

Among wide applications of nonnegative matrix factorization and those of low- 
rank matrix completion, there is a rich subset of problems where data matri- 
ces can be well approximated by matrix factorizations that are both low-rank 
and nonnegative, while some of the data (matrix elements) are missing. To 
best recover missing data, we propose to combine nonnegative matrix factor- 
ization and matrix completion, utilizing both nonnegativity and low-rankness 
in a date recovery formulation. This paper presents our first attempt to solve 
this non-convex formulation using an algorithm based on the classic alternating 
direction augmented Lagrangian method. The algorithm has a relatively low 
per-iteration complexity, especially when the approximation rank is low. Ex- 
tensive numerical results in this paper indicate that the underlying formulation 
is useful, and the performance of the alternating direction algorithm is satisfac- 
tory. Since global convergence and recovery guarantee results are still largely 
unknown, we hope that the results of this paper will also motivate further 
theoretical and numerical studies on this useful problem. 
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Figure 5: Recovered 768 x 1024 Kittens with estimate rank q = 40 by Algorithm [T] (left), 
LMaFit (middle), and FPCA (right) 




Alg1SR = 0.1 LMaFit SR = 0.1 FPCA SR = 0.1 




Alg 1 SR = 0.2 LMaFit SR = 0.2 FPCA SR = 0.2 




Alg 1 SR = 0.3 



LMaFit SR = 0.3 



FPCA SR = 0.3 
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Table 3: Panda: recovered images by Algorithm [T] LMaFit, and FPCA. The rank estimate 
for Algorithm [l] and LMaFit is 40. 



problem 


Alg0 


LMaFit 


FPCA 


seed 


CPU PSNR MSE 


CPU PSNR MSE 


CPU PSNR MSE 


SR: 10% 


3445 


45.40 23.57 4.40e-003 


56.25 18.92 1.28e-002 


59.07 23.51 4.46e-003 


31710 


48.27 23.34 4.64e-003 


66.37 18.55 1.40e-002 


59.74 23.66 4.30e-003 


43875 


44.12 23.55 4.42e-003 


50.24 19.46 1.13e-002 


59.28 23.67 4.29e-003 


69483 


51.63 23.77 4.20e-003 


27.67 20.77 8.37e-003 


59.06 23.63 4.34e-003 


95023 


57.91 23.46 4.51e-003 


33.41 19.67 1.08e-002 


59.80 23.71 4.26e-003 


SR: 15% 


3445 


32.97 25.83 2.61e-003 


40.07 24.79 3.32e-003 


73.32 25.26 2.98e-003 


31710 


29.51 25.74 2.67e-003 


37.09 24.79 3.32e-003 


72.35 25.15 3.06e-003 


43875 


32.57 25.84 2.61e-003 


15.12 25.20 3.02e-003 


72.60 25.14 3.06e-003 


69483 


29.34 25.80 2.63e-003 


31.99 24.84 3.28e-003 


72.85 25.13 3.07e-003 


95023 


26.69 25.80 2.63e-003 


17.87 25.20 3.02e-003 


71.65 25.03 3.14e-003 


SR: 20% 


3445 


28.45 26.55 2.21e-003 


30.25 26.58 2.20e-003 


91.00 25.81 2.62e-003 


31710 


25.79 26.54 2.22e-003 


17.71 26.46 2.26e-003 


91.23 25.87 2.59e-003 


43875 


30.69 26.56 2.21e-003 


20.51 26.27 2.36e-003 


90.20 25.73 2.67e-003 


69483 


23.13 26.56 2.21e-003 


22.06 26.42 2.28e-003 


90.40 25.85 2.60e-003 


95023 


31.05 26.59 2.19e-003 


27.80 26.58 2.20e-003 


90.67 25.79 2.64e-003 
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